The study of population genetics in the Bemisia tabaci complex has been limited by the lack of conserved
molecular markers. In this study, 358, 433 and 322 new polynucleotide microsatellites were identified
from the transcriptome sequences of three cryptic species of the B. tabaci complex. The cross-species
transferability of 57 microsatellites was then experimentally validated. The results indicate that these
markers are conserved and have high inter-taxon transferability. Thirteen markers were employed to assess
the genetic relationships among six cryptic species of the B. tabaci complex. To our surprise, the inferred
phylogeny was consistent with that based on mitochondrial COI sequences, indicating that microsatellites have
the potential to distinguish species of the B. tabaci complex. Our results demonstrate that developing
microsatellites from transcriptome data is a fast and cost-effective approach. These markers can be used to
analyze the population genetics and evolutionary patterns of the B. tabaci complex.



The whitefly Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae) is a species complex containing at least 34
cryptic species 1-4. The species complex colonizes more than 600 plant species and causes significant
damage by transmitting plant viruses and feeding on plant phloem sap 3. These cryptic species are
morphologically indistinguishable 5, and the mitochondrial cytochrome oxidase I (mtCOI) marker has been widely
used to delimit members of the complex 2,6. To date, at least 12 distinct genetic groups have been identified
from the complex based on mtCOI sequences 2,6, and all available mating studies support the species-level
boundaries 3. The 12 distinct groups relate to the break in divergence frequencies identified at around 12%.
However, it is perhaps more important to consider that 4 major clusters represent the complex:
1) Sub-Saharan Africa (the ancestral cluster); 2) Asia; 3) New World and 4) North Africa/Middle East/Asia Minor.

During the last twenty years, the Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED) cryptic
species of the complex have invaded many countries around the world, and the invasions of MEAM1 and MED are
associated with the displacement of closely related members of the complex 7,8. Numerous efforts have been made
to reveal the factors responsible for the invasion of MEAM1 and MED whiteflies. However, because the
species of the B. tabaci complex are morphologically indistinguishable, the evolution of the complex and the
migration and displacement processes of the MEAM1 and MED invasions are hard to trace.

Previously, various genetic markers have been used to study the genetic diversity and structure of different
cryptic species of the B. tabaci complex, such as random amplification of polymorphic DNA (RAPD) PCR 9, amplified
fragment length polymorphisms (AFLP) 10, restriction fragment length polymorphism (RFLP) 11, mitochondrial
DNA 6, ribosomal ITS1 12 and microsatellite markers 13-16. Among these genetic markers, microsatellites, or simple
sequence repeats (SSRs), are tandemly repeated DNA motifs composed of units 1-6 base pairs (bp) long 17,
which can be highly polymorphic among populations and are valuable for linkage mapping, comparative genomics
and gene-based association studies 18. In addition, microsatellites are indispensable tools for
reconstructing invasion histories and colonization routes and for revealing population bottlenecks and regional
dispersal patterns 19. Owing to these advantages, microsatellites have become increasingly popular for analyses of
population genetics and evolutionary mechanisms of pest invasions 20.
To date, 54 microsatellite markers are available for B. tabaci 14,21-23.
However, all of these microsatellites were derived from genomic
DNA, and the connections between these markers and gene functions
are completely unknown. Expressed Sequence Tag (EST) and transcriptome
sequences contain polymorphic genetic markers and can
be used to identify microsatellites 24. Compared to genomic DNA-derived
microsatellites, EST- and transcriptome-derived microsatellites
lack introns and intergenic regions and can correspond to genes
with known or predicted functions 25. In addition, these microsatellites
have fewer null alleles and stutter bands 26 and have more potential
statistical power in multiple comparisons 27. Furthermore,
EST- and transcriptome-derived microsatellites have a high degree of
transferability across species 28 and can be used in closely related
species 29. Therefore, systematic investigation of EST- and transcriptome-derived
microsatellites will facilitate evolutionary and comparative
studies in the B. tabaci complex, which is composed of closely related cryptic
species 27.

Recently, the transcriptomes of two invasive (MEAM1, MED) and
one indigenous whitefly species (Asia II 3) have been sequenced 30,31.
These studies have generated a tremendous amount of data and
provided a valuable source for the identification of microsatellite
markers in whiteflies. The first objective of this study was to identify
microsatellites from the three transcriptomes. Because microsatellites
located in different regions of a gene serve various functions 32,
the distribution of microsatellites within genes was also analyzed.
Furthermore, PCR experiments were employed to verify the predicted
microsatellites and their cross-species transferability. By comparative
analysis of the newly developed microsatellites, the genetic
relationships of six B. tabaci species were revealed. This study provides
a rich resource of microsatellites for the B. tabaci complex and
will facilitate research on whitefly genetic diversity and evolution.

Results 

Identification of microsatellites from the B. tabaci transcriptome
databases. A total of 27.653 Mbp, 44.937 Mbp and 24.468 Mbp of
sequences from the MEAM1, MED and Asia II 3 transcriptomes
were used for mining microsatellites with the MISA-MicroSatellite
program 33 (Table 1). There were 6419, 11711 and 4115
microsatellites in MEAM1, MED and Asia II 3, respectively (Table
S1), which corresponds to one microsatellite per 3.837-5.946 kbp of
transcriptome sequence. The total numbers of polynucleotide
repeats were 358, 433 and 322 in MEAM1, MED and Asia II 3,
respectively. While most microsatellite-containing unigenes have
only one microsatellite, 362, 299 and 190 unigenes
contain multiple microsatellites. In addition, 277, 367 and 193
unigenes contain compound microsatellites in MEAM1, MED and
Asia II 3, respectively (Table 1).
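
The density range quoted above can be re-derived from the totals in Table 1. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and are not taken from the original analysis.

```python
# Transcriptome sizes (bp) and microsatellite counts, as listed in Table 1.
datasets = {
    "MEAM1": (27_653_107, 6_419),
    "MED": (44_936_957, 11_711),
    "Asia II 3": (24_468_191, 4_115),
}


def kbp_per_microsatellite(total_bp, n_microsatellites):
    """Average amount of sequence (in kbp) per identified microsatellite."""
    return total_bp / n_microsatellites / 1000


spacing = {
    species: kbp_per_microsatellite(total_bp, n)
    for species, (total_bp, n) in datasets.items()
}

for species, kbp in spacing.items():
    print(f"{species}: one microsatellite per {kbp:.3f} kbp")
```

MED gives the lower bound of the reported range and Asia II 3 the upper bound, with MEAM1 in between.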

Of the characterized microsatellites, mononucleotide repeats were
the most common, followed by dinucleotide, trinucleotide and tetranucleotide
repeats (Fig. 1A). On the complementary strand, a poly(A)
repeat is the same as a poly(T) repeat. Similarly, in different reading
frames or on the complementary strand, (AC)n is the same as (CA)n,
(TG)n and (GT)n, while (AAG)n is the same as (AGA)n, (GAA)n,
(CTT)n, (TTC)n and (TCT)n. Thus, mononucleotide, dinucleotide
and trinucleotide repeats can be grouped into 2, 4 and 10 unique
classes, respectively 34. In the three species of the B. tabaci complex,
A/T motifs were the most abundant among mononucleotide repeats, and
the AG class was the most common among dinucleotide repeats (Fig. 1B).
However, the usage of trinucleotide motifs differed among species. In MEAM1, the
AAG class was the most widespread, followed by the ATG and AAC classes
(Fig. 1C). In MED, the AAG class was also the most prevalent,
whereas in Asia II 3 ATG was the most frequent class (Fig. 1C). A
total of 45 tetranucleotide microsatellites were identified from both MEAM1
and Asia II 3. Furthermore, two pentanucleotide motifs were found in Asia II 3
and one hexanucleotide motif was found in each of MEAM1 and Asia II 3
(Fig. 1A).
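
The grouping of motifs into 2, 4 and 10 classes follows from treating all rotations of a motif (different reading frames) and of its reverse complement (the other strand) as equivalent. The sketch below checks this by enumeration. The helper names are illustrative, and picking the alphabetically smallest member as the class label is a convention assumed here, not something stated in the text.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical(motif):
    """Smallest string among all rotations of the motif and of its reverse complement."""
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def motif_classes(length):
    """All distinct motif classes for repeat units of the given length."""
    motifs = ("".join(p) for p in product("ACGT", repeat=length))
    return sorted({canonical(m) for m in motifs if is_primitive(m)})


for k in (1, 2, 3):
    print(k, len(motif_classes(k)), motif_classes(k))
```

With this convention, (AAG)n, (AGA)n, (GAA)n, (CTT)n, (TTC)n and (TCT)n all map to the class AAG, and (AC)n, (CA)n, (TG)n and (GT)n map to AC.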

Distribution of microsatellites in 3'UTR, 5'UTR and CDS regions.

The distribution of polynucleotide microsatellites in CDS, 5'UTR
and 3'UTR regions was investigated. Based on BLASTx homology
information, the positions of 71, 71 and 40 polynucleotide
microsatellites were determined for MEAM1, MED
and Asia II 3, respectively (Table 2 & Table S2). In the CDS, 45, 48 and 27
microsatellites with polynucleotide repeats were found in MEAM1,
MED and Asia II 3, respectively, which is markedly higher
than the numbers found in the UTRs (Table 2). Interestingly, the number of
trinucleotide repeats in the CDS was also much higher than that of
other types of microsatellites. The characteristics of the amino
acids encoded by the trinucleotide repeats in the CDS were then
investigated. In MEAM1, MED and Asia II 3, a total of 37, 37 and
20 triplet codons were found, encoding 10, 15 and 8 different
amino acids, respectively (Fig. 2A). Codons encoding aromatic
amino acids made up the largest share, followed by those encoding aliphatic and
heterocyclic amino acids (Fig. 2B). The numbers of codons encoding hydrophilic
amino acids were 29, 20 and 17, while those encoding hydrophobic amino
acids were 3, 15 and 3 in MEAM1, MED and Asia II 3, respectively.

Gene Ontology (GO) and KEGG annotation of microsatellite-containing
sequences. GO assignments were used to classify the
functions of the genes with microsatellites. Based on sequence
homology, 272, 456 and 255 microsatellite-containing sequences
from MEAM1, MED and Asia II 3, respectively, have GO annotations and can
be categorized into 35 functional groups. The 35 functional groups
fall into three main categories (Biological process,
Cellular component and Molecular function) (Fig. 3). GO analysis
showed that the 'Metabolic process', 'Cellular process' and 'Cell part'
terms are dominant. Next, these genes were annotated to
KEGG pathways. A total of 318, 538 and 297 microsatellite-containing
sequences of MEAM1, MED and Asia II 3 were mapped to 193, 222
and 210 KEGG pathways, respectively (Table S3). Some of these
genes are related to resistance to environmental stresses and
insecticides, such as aldehyde oxidase 35, cytochrome P450 36 and
mitogen- and stress-activated protein kinase 2 37 (Table S4).

Characterization of the predicted microsatellite markers. To
validate the predicted microsatellites, 88 primer pairs were
synthesized (24 for MEAM1, 32 for MED and 32 for Asia II 3) to
amplify microsatellites from whitefly DNA. Among the 88 primer
pairs, 57 (65%) generated clear PCR products and 31 (35%) did not



Table 1 | Frequency and distribution of microsatellites in three species of the B. tabaci complex

Descriptions                                       MEAM1         MED           Asia II 3
Total number of sequences examined                 57,741        168,900       52,535
Total size of examined sequences (bp)              27,653,107    44,936,957    24,468,191
Total number of identified microsatellites         6,419         11,711        4,115
Number of microsatellite-containing sequences      6,057         11,512        3,925
Number of polynucleotide microsatellites           358           433           322
Number of sequences with > 1 microsatellite        362           299           190
Number of compound microsatellites                 277           367           193

Figure 1 | Distribution of microsatellites in the three whitefly species.
(A) Distribution of repeat loci. (B) Distribution of different dinucleotide
repeats. (C) Distribution of different trinucleotide repeats.

yield good amplification. The amplification success rates for
markers from MEAM1, MED and Asia II 3 were 70.83% (24
designed, 17 effective), 53.13% (32 designed, 17 effective) and
71.88% (32 designed, 23 effective), respectively. The failure to
amplify some microsatellites may be caused by non-specific primers, the
presence of large introns in the genomic DNA or inappropriate PCR
conditions. Among the 57 markers that generated clear PCR
products, 42 showed polymorphism, whereas 15
generated a single PCR product (monomorphism). Details (repeat
motifs, PCR primers, allele sizes, gene ID, accession number and
possible change of amino acids) of the 57 markers are shown in
Table S5.
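
The amplification success rates follow directly from the numbers of designed and effective primer pairs. A short sketch of the calculation, with illustrative variable names:

```python
# Primer pairs designed and primer pairs giving clear PCR products,
# grouped by the species whose transcriptome the marker came from.
designed = {"MEAM1": 24, "MED": 32, "Asia II 3": 32}
effective = {"MEAM1": 17, "MED": 17, "Asia II 3": 23}


def success_rate(n_effective, n_designed):
    """Percentage of designed primer pairs that amplified cleanly."""
    return 100.0 * n_effective / n_designed


rates = {sp: success_rate(effective[sp], designed[sp]) for sp in designed}
overall = success_rate(sum(effective.values()), sum(designed.values()))

for sp, rate in rates.items():
    print(f"{sp}: {effective[sp]}/{designed[sp]} = {rate:.1f}%")
print(f"Overall: {sum(effective.values())}/{sum(designed.values())} = {overall:.1f}%")
```

The per-species counts add up to the 57 effective and 88 designed primer pairs given in the text.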



Table 2 | The numbers of polynucleotide microsatellites in 3'UTRs, 5'UTRs and CDS

Species/regions    P2    P3    P4    Total
MEAM1
  5'UTRs           6     1     0     7
  3'UTRs           13    6     0     19
  CDS              8     37    0     45
MED
  5'UTRs           6     2     0     8
  3'UTRs           13    2     0     15
  CDS              11    37    0     48
Asia II 3
  5'UTRs           3     1     0     4
  3'UTRs           5     4     0     9
  CDS              6     20    1     27

P2, P3 and P4 represent the types of microsatellites.
P2: dinucleotide; P3: trinucleotide; P4: tetranucleotide.



The cross-taxa transferability of microsatellites. Next, the cross-amplification
of the 57 primer pairs was investigated by PCR and
capillary electrophoresis 38. The results showed that 42 (73.68%)
amplified fragments from all three whitefly species (Table
S6), suggesting that these microsatellites are highly conserved and
may act as genetic markers for the B. tabaci complex. In addition, 8
primer pairs amplified fragments from the two invasive
whiteflies only and 1 pair amplified from both MED and Asia II 3,
while the remaining 6 primer pairs amplified fragments
from only one of the three species. Next, the functions of the 42
microsatellite-containing genes were determined through BLAST
searches, of which 33 showed significant similarity to known genes
(Table S5). The alleles of these 42 microsatellites were compared
among MEAM1, MED and Asia II 3 whiteflies. Of the 42
microsatellites, 26 contained the same alleles in MEAM1 and
MED, whereas only 17 markers shared alleles between MEAM1 and Asia II 3
and 16 between MED and Asia II 3 (Table S6), which is
consistent with the fact that MEAM1 and MED are the most closely
related of the three cryptic species. Interestingly, the
sodium channel gene (Gene ID 99267), which is associated with
xenobiotic resistance 39, was found to contain different alleles in
invasive and native whitefly species (Table S6).

Characteristics of the 13 microsatellites in six species of the B.
tabaci complex. Thirteen polymorphic microsatellites (marked in yellow
in Table S5) were then employed to assess polymorphism
and heterozygosity among six laboratory colonies of the B. tabaci
complex (8 individuals for each species) (Table 3). A total of 93
alleles were identified from laboratory colonies of the six species
based on the 13 microsatellite markers. When the
performance of the 13 microsatellite markers was compared across the
6 cryptic species, the number of alleles (N_A) ranged from 3 to 15, with
an average of 7.2 alleles per locus. The observed (H_O) and expected
(H_E) heterozygosities ranged from 0.022 to 0.581 and from 0.297 to 0.888,
respectively. When the 6 cryptic species were compared across these loci,
the China 1 cryptic species exhibited the highest
N_A (3.077), while MEAM1 and Asia II 7 displayed the lowest N_A
(2.231) (Table 3). MEAM1 showed the lowest observed (H_O)
and expected (H_E) heterozygosities. Null alleles were detected at two
loci (291416, 27966) in MEAM1, two loci (31541, 22561) in MED,
one locus (25869) in Asia II 3, three loci (102573, 29116, 99267) in
Asia II 7 and two loci (36306, 102573) in Asia II 6. The Hardy-Weinberg
equilibrium test was performed for each locus in each
population; 25 of 50 groups deviated significantly from
Hardy-Weinberg equilibrium (Table 3). Significant genotypic
linkage disequilibrium was not detected. With regard to the
polymorphism (PIC) of the 13 microsatellite-containing genes, the
Figure 2 | The characteristics of trinucleotide repeats in the CDS region of three species. (A) Distribution of different amino acid codons in the three
species. (B) Distribution of codons encoding different types of amino acids in the three species.



glycine-rich cell wall structural protein precursor (Gene ID: 29116),
sodium channel (Gene ID: 99267) and NADH dehydrogenase subunit
4L (Gene ID: 76476) had high polymorphism (Table 3). The CCAAT/
enhancer binding protein (Gene ID: 102573) showed relatively low
polymorphism (Table 3). The polymorphism information content
(PIC) ranged from 0.246 to 0.866, with an average of 0.612 in the
six cryptic species, which indicates the effectiveness of microsatellite
markers for detecting polymorphism 40. In addition, the genetic
diversity of the 13 microsatellite loci was compared among the six
whitefly species. The level of gene diversity from the highest to the
lowest was Asia II 3, MED, China 1, Asia II 6, MEAM1 and Asia II 7
(Table S7). Fig. 4 displays the gene diversity of the 13 microsatellite
loci in every comparison. For example, in the comparison of MED
and MEAM1, 7 of the 13 loci in MEAM1 showed a decrease in gene
diversity compared to MED. To further test these markers,
4 microsatellites were randomly picked for sequencing and analysis of the
alleles per locus among the six cryptic species (Fig. 5). Complex
mutational patterns (single base mutations, changes in repeat units and
indels within the flanking region) were observed in these transcriptome-derived
microsatellites (Fig. 5), which is normal in insects 41.

The implications of microsatellites in evolutionary analysis. To
date, at least 12 distinct genetic groups have been identified from the
complex based on mtCOI sequences, which consist of 4 major
clusters that represent the complex: 1) Sub-Saharan Africa (the
ancestral cluster); 2) Asia; 3) New World and 4) North Africa/
Middle East/Asia Minor. The 6 species covered here belong to 2 of the 4
major clusters, i.e. Asia and North Africa/Middle East/Asia Minor.
The Neighbor Joining method was used for cluster analysis of the six
species (Figure 6 A & B). Interestingly, the genetic relationship based
on the 13 microsatellite loci is in agreement with the phylogeny of the
partial mtCOI sequences 2. Principal coordinates analysis (PCA) was
performed using GENALEX 6 software 42, and the results revealed that
China 1, Asia II 3, Asia II 6 and Asia II 7 clustered on the left side
of the plot (Figure 6 B & C). When the first and third
factors were considered, the results clearly revealed three genetic groups
among the 6 cryptic species. The PCA supports the
phylogenetic clusters. In conclusion, these microsatellites may be
used as markers to describe the genetic diversity of the B. tabaci
complex.
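
The scale bar of the tree in Figure 6 is given in Nei's standard genetic distance D. As an illustration of what this quantity measures, the sketch below computes the uncorrected form of D = -ln(I) from per-locus allele frequencies. It is a minimal sketch with hypothetical input data; it omits any sample-size correction and is not the POPTREE2 or POPGENE implementation.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(I), without sample-size correction.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both populations must be scored at the same loci, in the same order.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("populations must be scored at the same loci")
    j_x = j_y = j_xy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        j_x += sum(p * p for p in freq_x.values())
        j_y += sum(q * q for q in freq_y.values())
        j_xy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())
    if j_xy == 0:
        return math.inf  # no allele is shared at any locus
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# Hypothetical allele frequencies at two loci (allele sizes in bp as keys).
species_a = [{180: 0.5, 184: 0.5}, {210: 1.0}]
species_b = [{180: 0.25, 184: 0.75}, {210: 0.5, 214: 0.5}]
print(round(nei_standard_distance(species_a, species_b), 4))
```

Two populations with identical allele frequencies give D = 0, and D grows as fewer alleles are shared. A matrix of pairwise D values is the input to Neighbor Joining.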

Discussion 

To our knowledge, this is the first investigation of the frequency and
distribution of B. tabaci microsatellites using transcriptome data
(Table 1). We found that mononucleotide repeats were the most
common, followed by dinucleotide, trinucleotide and tetranucleotide
repeats in all three species. These results are consistent with
the view that microsatellite abundance decreases as
motif repeat number and repeat length increase 43. These microsatellites
provide a valuable resource for the development of genetic
markers in B. tabaci. In addition, these markers offer a chance to
classify the functions of the microsatellite-containing genes.
Interestingly, some microsatellite-containing genes were found in
pathways related to environmental stress responses (Table S4).
These markers may open a new avenue for research on B.
tabaci pesticide resistance by estimating the frequencies of alleles in
resistance-related genes across populations 44.

Figure 3 | GO classifications of microsatellite-containing sequences from the three species. The right y-axis represents the number of microsatellite-

Table 3 | Continued

Locus no.       MEAM1    MED      Asia II 3    China 1    Asia II 7    Asia II 6
Overall F_IS    0.423    0.395    0.333        0.489      0.132        0.306

N_A: number of alleles detected; H_E: expected heterozygosity; H_O: observed heterozygosity; F_IS: inbreeding
index; Hardy-Weinberg tests are indicated together with F_IS values.
*P < 0.05; **P < 0.01; ***P < 0.001. Non-significant P values are not indicated. PIC: polymorphism information content.

Many studies have demonstrated that microsatellites derived from
transcribed sequences have higher transferability because the
sequences are conserved 45,46. In this study, among the 57 microsatellite
markers, 42 primer pairs (73.68%) amplified fragments from all
three species. These results illustrate that the transcriptome-derived
markers have high inter-taxon transferability 47. Therefore, these markers
can be used to amplify microsatellites from other closely
related species that do not yet have markers, and can be considered as
anchor markers for comparative mapping and evolutionary studies
across species 48.

For the 13 microsatellites among the colonies of six species (MEAM1,
MED, Asia II 3, Asia II 6, China 1 and Asia II 7) of the B. tabaci
complex, the number of alleles (N_A) ranged from 3 to 15, with an
average of 7.2 alleles per locus. Previously, Valle, Lourencao, Zucchi
and Pinheiro 23, Tsagkarakou and Roditakis 22 and De Barro, Scott,
Graham, Lange and Schutze 21 found that the number of alleles per
locus ranged from 1 to 2, 2 to 13 and 6 to 44, respectively. The
difference in polymorphism may be attributed to the different population
sizes or cryptic species used. Polymorphism and heterozygosity
are critical characteristics of microsatellite markers. The high
allele numbers and heterozygosity (Table 3) of these transcriptome-derived
markers suggest that they can be used to assess the
genetic profile of the B. tabaci complex. A total of 25 locus-population
groups deviated significantly from Hardy-Weinberg equilibrium (Table 3).
Possible sources of the heterozygote deficiencies include recurrent inbreeding,
subpopulation structure (Wahlund effect) and/or null alleles. Large differences between
H_O and H_E were also found among the genomics-based microsatellites
developed for B. tabaci by De Barro 49. In addition, marked
differences between H_O and H_E were also observed in the genomics-based
microsatellites of B. tabaci collected in different
environments 14,16. Some loci did not have heterozygosity and the F_IS values
were all positive, suggesting that loss of heterozygosity can happen
at some microsatellite loci. This is probably due to the fact that our
samples were obtained from laboratory populations. Additional experiments
with field populations are warranted to reveal the polymorphism
and heterozygosity of different species of the B. tabaci complex.

Figure 4 | Comparison of gene diversity among the six cryptic species of the B. tabaci complex. Each point represents the gene diversity of one locus and
the lines connect points from different species.

Figure 5 | Mutational patterns of selected microsatellites in B. tabaci. The black lines indicate the repeat motifs and the black-dotted lines represent the
indels in the flanking region. Panels show sequence alignments of MEAM1, MED, Asia II 3, China 1, Asia II 6 and Asia II 7 for four loci:
BT_Q_ZJU_Singletons29116 (repeat motif: TA), BT_ZHJl_ZJU_Unigene30306 (repeat motif: GA), BT_B_ZJU_Singletons98023 and
BT_B_ZJU_Singletons99267 (repeat motif: TG).

Due to the lack of distinguishing morphological characteristics, the systematics
of the B. tabaci species complex depends largely on molecular
approaches 2-4,6. To date, the evolutionary history of the B. tabaci complex
has been inferred exclusively from the partial sequence of the COI gene 2,6,50.
Other molecular markers are greatly needed to further illustrate the
complicated evolutionary relationships of the B. tabaci species complex.
Of the 4 major clusters that represent the B. tabaci species
complex, the most divergent is Sub-Saharan Africa, followed
by New World. We have used 11 newly developed microsatellites to
study Sub-Saharan Africa 1, and the results revealed that they
work on this cryptic species (data unpublished). Owing to the absence
of specimens, we did not study the New World cluster. However, because
Sub-Saharan Africa is more divergent than New World, markers that work on
Sub-Saharan Africa 1 are likely to work on New World species as well.
Therefore, our newly developed microsatellites
are conserved and many of them can be used to analyze
other cryptic species of the B. tabaci complex. The genetic relationship
of the 6 species derived from 13 novel conserved microsatellite markers
is in accordance with the phylogeny of the partial COI sequences.
It provides additional evidence for the evolutionary relationship of
the 6 cryptic species. In addition, our results indicate that microsatellites
can be considered suitable markers for future evolutionary
analysis in other species complexes.

Conclusion 

In this study, we characterized a large number of transcriptome-derived
microsatellites from members of the B. tabaci complex.
Functional categorization of the microsatellite-containing genes
provides valuable information about the potential functions of these
microsatellites. In addition, 57 microsatellite markers were experimentally
validated. Many of these microsatellites can be used for
population analysis and are transferable across different species of
the B. tabaci complex. Complex mutational patterns were observed
in these transcriptome-derived microsatellites. What's more, by ana- 
CDS with unexpected stop codons in the BLAST hit region were removed. Start codon
positions were determined by examining the in-frame ATG codons present within 30 bp
upstream or downstream of the beginning of the aligned reference protein. Stop
codon positions were determined by examining in-frame TAA, TAG and TGA
motifs present within 30 bp of the stop codon of the reference protein. The 5' and 3'UTR
regions were defined based on the CDS prediction. The locations of microsatellites were
determined based on the predicted 5'UTR, 3'UTR and CDS regions.
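
Once the CDS boundaries of a unigene are predicted, assigning a microsatellite to a region reduces to comparing coordinates. The sketch below illustrates one way to do this. The function name, the 1-based inclusive coordinate convention and the separate "boundary" label for repeats that straddle a CDS edge are assumptions made for illustration; the text does not say how such cases were handled.

```python
def classify_region(ssr_start, ssr_end, cds_start, cds_end):
    """Label a microsatellite by its position relative to the predicted CDS.

    All coordinates are 1-based and inclusive, on the coding strand of the unigene.
    """
    if ssr_start > ssr_end or cds_start > cds_end:
        raise ValueError("start coordinates must not exceed end coordinates")
    if ssr_end < cds_start:
        return "5'UTR"
    if ssr_start > cds_end:
        return "3'UTR"
    if cds_start <= ssr_start and ssr_end <= cds_end:
        return "CDS"
    # Hypothetical extra category: the repeat overlaps a CDS boundary.
    return "boundary"


# Hypothetical unigene with a predicted CDS at positions 100-400.
for start, end in [(10, 30), (150, 170), (410, 430), (95, 110)]:
    print((start, end), classify_region(start, end, 100, 400))
```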

GO and KEGG pathway analysis. To understand their functions, all microsatellite-containing
genes were searched against the GenBank nr protein database using
BLASTx with an E-value cut-off of 10^-5. Blast2GO software was used to assign
Gene Ontology (GO) terms to these microsatellite-containing genes 51. Blastall
software was employed to perform the pathway analysis by searching all genes against
the KEGG database.
Figure 6 | Phylogenetic and principal components analysis of the six
cryptic species. (A) Phylogeny of the six cryptic species. POPTREE2
software 59 was used to construct the phylogenetic tree and the numbers on
the nodes represent bootstrap support. The scale bar represents Nei's
standard genetic distance D 60. (B) Two-dimensional scatter plot of the first
and second factors for the 6 cryptic species. The first and second PCs account
for 60.21% and 20.52% of the variation, respectively. (C) Two-dimensional
scatter plot of the first and third factors for the 6 cryptic species.
The first and third PCs summarize 60.21% and 13.75% of the variation,
respectively.



Sample collection and DNA extraction. The invasive MEAM1 (mtCOI GenBank
accession number: GQ332577) and MED (KF452516) and the indigenous Asia II 3
(KF452527), Asia II 6 (KC540758), China 1 (KF452525) and Asia II 7 (EU192043)
species of the B. tabaci complex were collected from Zhejiang, China. These cryptic
species were maintained separately on cotton (cultivar Zhe-Mian 1793) under the
following controlled conditions: 27 ± 1 °C, a photoperiod of 14 h light:10 h darkness
and relative humidity of 70 ± 10% 7. Total DNA was extracted from individual female
adult whiteflies following the method of Frohlich et al. The purity of the populations
was verified by PCR amplification of a 0.7 kb fragment of the mtCOI gene. The
following forward and reverse primers were used to amplify the partial mtCOI
sequences: 5'-TGRTTYTTTGGTCATCCVGAAGT-3' and
5'-TTACTGCACTTTCTGCCACATTAG-3'.

Validation of microsatellite markers using PCR. To test these markers, Primer
Premier 5.0 53 was used to design PCR primers from the sequences flanking the
microsatellites. Bemisia tabaci produces males from unfertilized eggs and females
from fertilized eggs. Therefore, only females were used to test polymorphism at each
locus. An M13-specific sequence (5'-CACGACGTTGTAAAACGAC-3') was added to the 5'-end
of each forward primer, and an M13 primer labelled with a fluorescent dye (FAM or HEX,
Applied Biosystems) was included in the reaction 38. PCR was carried out in an S1000
thermal cycler (Bio-Rad). A 15 µL
PCR reaction contained 0.25 µL of 100 pmol/µL forward primer, 0.25 µL of 100 pmol/µL
reverse primer, 0.25 µL of 100 pmol/µL 5' dye-labelled M13 primer, 1.6 µL of 10× Ex
Taq Buffer, 1.2 µL of 2.5 mM dNTP and 0.1 µL of Ex Taq polymerase (Takara, Japan).
PCR cycling conditions were 94 °C for 3 min, followed by 32 cycles of 96 °C for 15 s,
51-63 °C for 20 s and 72 °C for 50 s. The PCR products were diluted and detected
on a MegaBACE 1000 DNA analysis system (Amersham Biosciences) at the Center of
Analysis and Measurement in Zhejiang University. The ET550-R size standard (GE
Healthcare) and Genetic Profiler version 2.2 (GE Healthcare) were used to determine the
sizes of the amplified fragments.

Polymorphism and microsatellite distribution analysis. POPGENE
(version 1.31) 54 was used to calculate the total number of polymorphic alleles (N),
average number of alleles per locus (N_A), average number of effective alleles per locus
(Ne), observed heterozygosity (H_O), expected heterozygosity (H_E), genetic
identity (I), genetic distance (D) and gene diversity for each locus in the different
cryptic species. FSTAT (Version 1.2) 55 was used to examine allelic richness (R).
The inbreeding index (F_IS) and its P value were estimated with GENEPOP 4.0 56.
Polymorphism information content (PIC) was calculated with PIC-CALC version 0.6 57.
Null alleles and technical artifacts such as stuttering and large allele dropout were
assessed using MICRO-CHECKER v.2.2.3 58.
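
For readers unfamiliar with these summary statistics, the sketch below shows how H_O, H_E, PIC and a simple F_IS can be computed for one locus from diploid genotypes. It uses the textbook definitions (H_E = 1 - Σp_i², the Botstein PIC formula, F_IS = 1 - H_O/H_E) with hypothetical genotypes. The estimators in POPGENE, GENEPOP and PIC-CALC may apply sample-size corrections, so their values can differ from this simplified version.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [c / (2 * n) for c in counts.values()]

    h_obs = sum(a != b for a, b in genotypes) / n
    sum_sq = sum(p * p for p in freqs)
    h_exp = 1.0 - sum_sq  # uncorrected expected heterozygosity

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = 1.0 - sum_sq - cross

    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {"N_A": len(counts), "H_O": h_obs, "H_E": h_exp, "PIC": pic, "F_IS": f_is}


# Hypothetical genotypes (allele sizes in bp) for 8 females of one colony.
colony = [(180, 180), (180, 184), (184, 184), (180, 180),
          (184, 188), (180, 180), (184, 184), (180, 184)]
print(locus_statistics(colony))
```

A positive F_IS indicates fewer heterozygotes than expected under Hardy-Weinberg proportions, which is the pattern reported for the laboratory colonies.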

Data accessibility. DNA sequences: Sequences have been submitted to GenBank
under accession numbers KF916587-KF916610. Detailed information on the
predicted microsatellites from the transcriptomes of the three species is shown in
Supplementary Table S1. PCR primer sequences are presented in Supplementary
Table S5.



lyzing the genetic relationships of six B. tabaci species basing on 
newly developed microsatellites, the results indicated that these
microsatellites may be used as markers to describe the genetic diversity
of the B. tabaci complex. These markers enrich the existing microsatellite
markers of B. tabaci and can be used to analyze the genetic
diversity and evolutionary pattern of this whitefly complex.

Methods 

Mining the microsatellites from the transcriptome database. The MISA-MicroSatellite
identification tool (http://pgrc.ipk-gatersleben.de/misa/) was used to search
for microsatellites in the transcriptome data of MEAM1, MED and Asia II 3
whiteflies. Microsatellites were defined as mononucleotide repeats with ≥ 10
repeats and di-, tri-, tetra-, penta- and hexanucleotide repeats with ≥ 6 repeats 33. The
criterion for compound microsatellites was an interval of ≤ 100 bases between motifs.
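
The repeat thresholds above can be illustrated with a small regular-expression search. This is a simplified sketch of the kind of search MISA performs, not the MISA implementation itself: it reports perfect repeats only, does not merge neighbouring repeats into compound microsatellites, and all function names are illustrative.

```python
import re

# Minimum number of repeats per motif length, following the thresholds above.
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6}


def _is_primitive(motif):
    """False if the motif is a repeat of a shorter unit (e.g. 'AA' or 'ACAC')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, n_repeats) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not _is_primitive(motif):
                continue  # e.g. poly(A) is reported once, as a mononucleotide repeat
            n_repeats = len(match.group(0)) // unit_len
            hits.append((match.start() + 1, match.end(), motif, n_repeats))
    return sorted(hits)


# Hypothetical sequence: (A)12 and (AC)7 pass the thresholds, (AAG)5 does not.
example = "GG" + "A" * 12 + "CCGT" + "AC" * 7 + "TTG" + "AAG" * 5 + "C"
for hit in find_microsatellites(example):
    print(hit)
```

In the example, the five-copy AAG repeat is skipped because trinucleotide motifs need at least 6 repeats.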

The location of microsatellites. Coding sequences (CDS) of each gene were first
determined by BLASTx against the Swiss-Prot database using a threshold of 1 × 10^-5.